Crocodilians are diving reptiles that can hold their breath under water for long periods of time and are crepuscular
animals with excellent sensory abilities. They comprise a sister lineage of birds and have no sex chromosomes.
Here we report the genome sequence of the endangered Chinese alligator (Alligator sinensis) and describe its unique 
features. Next-generation sequencing generated 314 Gb of raw sequence, yielding a 2.3-Gb genome assembly. A total
of 22 200 genes were predicted in Alligator sinensis using a de novo, homology- and RNA-based combined model. The 
genetic basis of long-diving behavior includes duplication of the bicarbonate-binding hemoglobin gene, co-function- 
ing of routine phosphate-binding and special bicarbonate-binding oxygen transport, and positively selected energy 
metabolism, ammonium bicarbonate excretion and cardiac muscle contraction. Further, we elucidated the robust 
Alligator sinensis sensory system, including a significantly expanded olfactory receptor repertoire, rapidly evolving 
nerve-related cellular components and visual perception, and positive selection of the night vision-related opsin and 
sound detection-associated otopetrin. We also discovered a well-developed immune system with a considerable number
of lineage-specific antigen-presentation genes for adaptive immunity as well as expansion of the tripartite motif-containing,
C-type lectin, and butyrophilin genes for innate immunity and expression of antibacterial peptides.
Multifluorescence in situ hybridization showed that alligator chromosome 3, which encodes DMRT1, exhibits significant
synteny with chicken chromosome Z. Finally, population history analysis indicated population admixture 0.60-1.05 
million years ago, when the Qinghai-Tibetan Plateau was uplifted.

Introduction

The amniotes diverged into mammals and reptiles 
~320 million years ago (Mya) [1]. The early reptiles
evolved into a diverse fauna, including lizards, snakes, 
turtles, crocodilians, and birds. The body plan of modern
crocodilians has remained largely unchanged since
the earliest crocodilians appeared 240 Mya [2, 3], with 
their adaptations to aquatic environments and a diving 
lifestyle. An impressive feature of crocodilians is their 
ability to remain submerged for long periods of time. 
Their diving behavior serves several purposes.
First, crocodilians immerse themselves in the
water to ambush prey, which they kill by drowning [4]. 
Crocodilians never stray far from the water and usually 
enter the water to escape predation [4]. They are ecto- 
therms that behaviorally regulate their body temperature 
by shuttling between terrestrial and aquatic environments 
and controlling their diving depth [5]. Crocodilians also
submerge themselves during social interactions, such as
mating [6]. 

Monitoring of free dives [7] has shown that the croco- 
dilians exhibit two types of submergence behavior: long 
(~12 min) resting dives, the duration of which depends 
on body mass, and short (~1 min) active dives, which 
vary in length based on the animal's activities. Previous 
data suggest that the external environment is the most 
important determinant in prolonged aerobic dives [7], 
which might be extended to 1-2 h [8]. Crocodilians have 
evolved a series of physical adaptations for diving, in- 
cluding development of a palatal valve at the back of the 
throat to prevent entry of water into the throat, esopha- 
gus, and trachea when the animal is submerged [9]; ear- 
flaps and eye-lids to protect the inner ear and cornea, 
respectively [9]; thickening of the lung wall to resist 
underwater pressure [9]; and increased lung capacity [8]. 
Aerobic dive duration is extended when oxygen deprivation
prompts the crocodilian heart to stop blood flow to
the muscles, reserving aerobic energy expenditure for the
brain [9]. Switching of cardiac performance is controlled
by pulmonary oxygen tension [10]. 

Crocodilians are mostly active at dawn and dusk [9, 11]
and require a powerful sensory system to detect predators
and prey, sense environmental changes, and engage
in social interactions. They have a collection of highly 
adapted morphological characteristics for their crepuscu- 
lar lifestyle [9], including two large olfactory lobes in the 
brain, nerve ending-enriched sensory pits on the jaws to 
sense extremely small vibrations, a reflecting layer in the 
eye to improve night vision, and 2-4-fold more auditory 
fibers than birds and mammals [12], contributing to their 
remarkable hearing under water [13, 14]. Crocodilians 
also have a minimal body profile, allowing their sensory 
organs to break the water surface while the remainder of 
their body is hidden from view; this is considered one of 
their most significant adaptations. 

Crocodilians live in marshes, lakes, and rivers, and of- 
ten suffer from serious injuries — males during fights for 
mates and females during battles for nests [9]; however, 
they appear to recover quickly from open wounds in wa- 
ter and are thought to have a robust immune system that 
resists microbial infections [15]. Merchant et al. [16] re- 
vealed that crocodilians generate antimicrobial peptides 
in the blood and possess a powerful first-line defense 
against pathogens in aquatic environments. In addition to 
their physical and physiological adaptations, we expect 
whole-genome analysis to reveal molecular evidence of 
diving, sensory, and immune adaptations in crocodilians. 

Amniotes have evolved diverse sex-determining 
mechanisms: mammals exhibit XY-type genetic sex 
determination (GSD); birds exhibit ZW-type GSD; and
non-avian reptiles exhibit XY-GSD, ZW-GSD, and
temperature-dependent sex determination (TSD) [17]. 
Although complete genomes are available for one lizard
[1], three birds [18-20], and various mammals, all of these
species have sex chromosomes. In contrast, crocodilians 
exhibit TSD and do not possess sex chromosomes [21]. 
Therefore, genome sequencing of a crocodilian may pro- 
vide novel insights into sex chromosome evolution. 

There are 23 species of crocodilians, divided into three 
groups: Alligatoridae, Crocodylidae, and Gavialidae 
[22]. The Chinese alligator (Alligator sinensis), a freshwater
crocodilian endemic to China, is one of the most
endangered crocodilian species [23]. Currently, there are
~100 Chinese alligators in the wild and ~10 000 captive
individuals in Zhejiang and Anhui Provinces [24]. We
chose the Chinese alligator for genome sequencing with
the hope of providing information that could help design
scientific captive-breeding programs for the population
recovery of this endangered species.

Results 

Assembly and annotation 

We collected a male Chinese alligator from Changx- 
ing Yinjiabian Chinese Alligator Nature Reserve (Zheji- 
ang Province, China) and sequenced its genome using a 
whole-genome shotgun strategy. We obtained 314.03 Gb 
of raw sequence on a next-generation sequencing platform
(Illumina HiSeq 2000). SOAPdenovo [25] was used
to assemble the genomic sequence, resulting in a 2.3-Gb 
assembly with contig and scaffold N50 values of 23.4 kb 
and 2.2 Mb (Table 1). We assessed the assembly quality 
using bacterial artificial chromosome (BAC) clones and 
found that the scaffolds were reliably assembled except
in the GC-rich and repeat-rich regions, which remained as
gaps (Supplementary information, Figure S1).
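As an illustration of how an N50 value such as those quoted above is defined, the following minimal sketch computes N50 from a list of sequence lengths. It is not the pipeline used in this study; the function name `n50` and the toy lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a set of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (kb), for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

The same function applies to contigs and scaffolds alike; only the input list differs.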

A total of 22 200 genes were predicted in the alligator
by integrating de novo prediction, homology-based
prediction, and RNA-Seq data (Supplementary information,
Table S1); 79.35% of these were functionally
annotated (Supplementary information, Table S2). We
then annotated interspersed repeats and found that about 
37.93% of the alligator genome consists of DNA trans- 
posons, long interspersed nuclear elements (LINEs), long 
terminal repeats, and short interspersed nuclear elements 
(Supplementary information, Table S3); the LINEs were 
most abundant, comprising about 29.13% of the genome. 

Table 1 Assembled contigs and scaffolds of the Chinese alligator

8,641,424
Total size                 2,202,897,102    2,274,864
Total number (> 100 bp)    212,160          41,816
Total number (> 2 kb)      138,681          3,116

Genome landscape

GC content We calculated the GC content of the Chinese
alligator genome and detected a clear preference
for GC-rich regions, averaging 44.5%; Chinese alligator
showed the highest GC level among the studied organ- 
isms (Supplementary information, Figure S2). The GC 
patterns of Chinese alligator differed from those of most
representative animals but were similar to that of humans
(Homo sapiens) (Supplementary information, Figure S2),
with a wider GC range and a lower GC-content peak.
In contrast, the green anole lizard (Anolis carolinensis) 
and clawed frog (Xenopus tropicalis) yielded narrow 
elevated curves. Previous studies demonstrated that the 
green anole lizard and clawed frog possess an unusu- 
ally homogenous GC distribution, while humans have a 
heterogeneous GC curve [1, 26]. Thus, the Chinese al- 
ligator may have a heterogeneous GC distribution. We 
compared GC content in different regions of the alligator 
genome and found that GC frequencies were highest in 
gene regions (Supplementary information, Table S4). 
GC-rich regions often trigger gene recombination [18], 
suggesting that the Chinese alligator might undergo more 
gene conversion events than other amniotes.
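The distinction between a homogeneous and a heterogeneous GC distribution can be made concrete by computing GC content in fixed-size windows: a narrow spread of window values corresponds to a narrow, elevated curve, and a wide spread to a heterogeneous one. The sketch below is illustrative only; the window size, function names, and toy sequence are assumptions and do not reproduce the analysis behind Figure S2.

```python
def gc_fraction(seq):
    """Fraction of G and C among called bases (A, C, G, T); N is ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    called = gc + at
    return gc / called if called else float("nan")


def windowed_gc(seq, window):
    """GC fraction of consecutive, non-overlapping windows of a sequence."""
    return [gc_fraction(seq[i:i + window]) for i in range(0, len(seq), window)]


# Hypothetical sequence, for illustration only.
toy = "GCGCGCATATATGGCCNNNNATGC"
print(gc_fraction(toy))
print(windowed_gc(toy, 8))
```

Ignoring N bases matters for a draft assembly, because gap-filled regions would otherwise bias the window values downward.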

Repeat elements We compared transposable elements 
(TEs) of four reptiles — all with abundant LINEs, with 
the alligator showing relatively long LINE members 
(Supplementary information, Table S5). Then, we ana- 
lyzed the accumulation of different TEs based on diver- 
gence from the consensus sequence, which resembles the 
motifs of ancestral repeats [27], and found different pat- 
terns in the Chinese alligator, green anole lizard, chicken 
(Gallus gallus), and zebra finch (Taeniopygia guttata)
(Supplementary information, Figure S3). The alligator 
genome has accumulated the most highly divergent (old) 
TEs, showing one obvious LINE peak at high divergence
rates of 0.10-0.30. The lizard presented a slight fluctua- 
tion of TE coverage across different divergence rates, 
suggesting the loss of some old TEs and acquisition of 



relatively young TEs in comparison to Chinese alligator. 
TE coverage in the chicken and zebra finch genomes was
half of that found in the lizard and one-third of that
found in the alligator (Supplementary information,
Figure S3). Thus, the high genome coverage of old LINEs
might account for the larger genome of Chinese alligator
in comparison to the other three animals, especially birds.

Segmental duplication (SD) analysis Genomic se- 
quences of the Chinese alligator, green anole lizard, 
chicken, and zebra finch were subjected to whole- 
genome alignment and SD (length > 1 kb and similarity 
> 90%) analyses to assess genomic features. A total of 
35.90, 201.21, 68.85, and 122.32 Mb non-redundant SD 
blocks were identified in the alligator, lizard, chicken, 
and finch genomes, occupying 2.05%, 11.18%, 6.21%, 
and 9.92% of the genome, respectively. The short
read-based Chinese alligator assembly revealed fewer SD
blocks than the other three BAC-based assemblies,
possibly because short-read sequencing is prone to
failing to assemble recently duplicated genome
regions [28]. The four species were uniformly biased
toward smaller SDs (Supplementary information, Figure 
S4A-S4C), suggesting that the ancestral reptilian genome 
might have been characterized by frequent duplication 
of short segments. The sequence identities of the SD blocks were
evenly distributed in the lizard, higher in the
birds, and lower in the alligator (Supplementary
information, Figure S4D); thus, like the TEs (Supple- 
mentary information, Figure S3), older blocks were re- 
tained in Chinese alligator and more blocks were newly 
duplicated in chicken and zebra finch. 
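The SD criteria stated above (length > 1 kb, similarity > 90%) and the notion of non-redundant SD blocks can be sketched as a filter followed by an interval merge. This is a hypothetical illustration, not the software used in the study; the tuple layout, function name, and toy coordinates are assumptions.

```python
def nonredundant_sd_length(blocks, min_len=1000, min_identity=0.90):
    """Total non-overlapping length of duplication blocks passing the filters.

    blocks: iterable of (scaffold, start, end, identity) with 0-based,
    half-open coordinates. Overlapping blocks on the same scaffold are
    merged so that each base is counted once.
    """
    kept = sorted(
        (scaffold, start, end)
        for scaffold, start, end, identity in blocks
        if end - start > min_len and identity > min_identity
    )
    total = 0
    current = None  # (scaffold, start, end) of the interval being merged
    for scaffold, start, end in kept:
        if current and scaffold == current[0] and start <= current[2]:
            current = (scaffold, current[1], max(current[2], end))
        else:
            if current:
                total += current[2] - current[1]
            current = (scaffold, start, end)
    if current:
        total += current[2] - current[1]
    return total


# Hypothetical blocks, for illustration only.
toy_blocks = [
    ("s1", 0, 2000, 0.95),
    ("s1", 1500, 4000, 0.92),   # overlaps the previous block
    ("s1", 5000, 5500, 0.99),   # too short
    ("s2", 0, 3000, 0.85),      # identity too low
    ("s2", 100, 1200, 0.91),
]
print(nonredundant_sd_length(toy_blocks))
```

Dividing the returned length by the assembly size gives the percentage of the genome occupied by SDs.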

Genomic alignments between the alligator and chick- 
en showed better synteny than the pairing of alligator and 
lizard (Supplementary information, Figure S5), support- 



www.cell-research.com | Cell Research 



The genome of the Chinese alligator

ing the notion that Chinese alligator is a close relative 
of birds. The blocks that showed poor synteny between 
chicken and alligator were largely found in regions con- 
taining the densest SDs, TEs, and small gaps (Supplementary
information, Figure S5A), suggesting that the
large-scale syntenic breaks between alligator and lizard 
might be due to the relative abundance of SDs and TEs 
in the lizard (Supplementary information, Figure S5B). 
The chicken presented many SD and TE islands with 
heterogeneous distributions (Supplementary information, 
Figure S5A) in contrast to the relatively homogenous SD 
and TE curves in the lizard and alligator (Supplementary 
information, Figure S5B). Consistent isochores between 
TEs and SDs were seen in these species, especially in 
the chicken genome (Supplementary information, Figure 
S5), indicating that the occurrence of SDs was related to 
that of TEs. In view of the most abundant LINEs in the 
reptiles (Supplementary information, Table S5), we fur- 
ther examined the relationship between the SD and LINE 
distributions and found a significant positive correlation (r 
= 0.89; P = 0.001), suggesting that the SD events might 
be triggered by LINEs, which has also been demonstrat- 
ed in mammals [29]. 
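For readers unfamiliar with the statistic, the correlation coefficient r reported above is a Pearson product-moment correlation. A minimal sketch follows, assuming per-window SD and LINE densities as inputs; the numbers are hypothetical and are not the data underlying r = 0.89.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-window densities, for illustration only.
sd_density = [1, 2, 3, 4, 5]
line_density = [2, 4, 5, 4, 5]
print(round(pearson_r(sd_density, line_density), 4))
```

A high r indicates co-variation along the genome; on its own it does not establish that LINEs trigger SDs.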

The genomes of chicken and green anole lizard have 
been assembled at the chromosomal level (Ensembl 
release 63). Some concordant gaps were seen in the 
distributions of SDs and TEs; these coincided precisely 
with the highest densities of Ns (nucleotides) in the gaps 
(Supplementary information, Figure S5). Furthermore, 
each chromosome of the chicken and green anole lizard 
had only a single SD-TE gap. Centromeres typically 
contain numerous repeats [30], leading to unclosed gaps 
in the assemblies of centromeric DNA regions, as seen in 
the human genome assembly (NCBI Build 37.3). Thus, 
the coincident gaps among the SDs, TEs, and N-filled 
regions of these genomes might represent centromeres. 

Genome adaptive divergence 

We resolved features of genome adaptive divergence 
from an increase in gene copy number, an increase in 
nonsynonymous (dN) over synonymous (dS) substitutions,
and lineage-specific genes. These genetic signatures of 
the Chinese alligator genome were then matched to the 
biological traits of crocodilians. 

Expansion of gene families We employed TreeFam to 
deduce gene clusters from two non-avian reptiles (Chi- 
nese alligator and green anole), three avian reptiles (birds: 
chicken, turkey, and zebra finch), and one mammal (hu- 
man; outgroup species). We found that among the five 
reptiles, the Chinese alligator had developed more unique 
paralogs and unclustered genes (Figure 1A), suggest- 



ing that the alligator has more lineage-specific genomic 
features. From the unique paralogs, we identified 413 
alligator-specific multi-copy gene families (Supplemen- 
tary information, Table S6). Based on the single-copy 
gene families of these six species, we constructed a phy- 
logenetic tree and calculated divergence time using fossil 
records. Our results revealed that the crocodilian lineage 
split from the common ancestor of birds ~241 Mya (Figure
1B). According to divergence times and phylogenetic
relationships, we used CAFE to assess the clustering
relationships and identified 363 gene families that were
significantly expanded in the Chinese alligator (P < 0.05) 
(Figure 1C). This result highlights the presence of dis- 
tinct components in the alligator genome. 

Rapid evolution analysis In addition to an expansion 
in paralogs, adaptation features of genome divergence 
usually induce an excess of nonsynonymous over synonymous
substitutions (dN > dS) at orthologous genes
[31], which can be identified as positively selected genes 
(PSGs) by the likelihood ratio test and lineage-specific 
accelerated evolving gene ontology (GO) categories by 
the binomial test. We obtained 7 337 strictly filtered
1:1:1:1:1 orthologous genes in the Chinese alligator, 
green anole lizard, zebra finch, chicken, and turkey (Me- 
leagris gallopavo) genomes and identified 219 PSGs 
(Supplementary information, Table S7) and 86 rapidly 
evolving GO classes (Supplementary information, Table 
S8). 
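The logic of a likelihood ratio test for positive selection can be sketched as follows: a null model that disallows dN/dS > 1 is compared with an alternative that allows it, and twice the log-likelihood difference is referred to a chi-square distribution. The sketch assumes one degree of freedom and hypothetical log-likelihood values; the source does not state the models or degrees of freedom used, so both are assumptions.

```python
import math


def omega(dn, ds):
    """dN/dS ratio; values above 1 are the classical signature of positive selection."""
    if ds <= 0:
        raise ValueError("dS must be positive")
    return dn / ds


def lrt_pvalue_df1(lnl_null, lnl_alt):
    """Likelihood ratio test with 1 degree of freedom (an assumption here).

    Returns (statistic, p-value), where statistic = 2 * (lnL_alt - lnL_null)
    and the p-value is the chi-square (df = 1) upper tail, erfc(sqrt(x / 2)).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods for one orthologous gene, for illustration only.
stat, p = lrt_pvalue_df1(lnl_null=-1000.0, lnl_alt=-998.0)
print(stat, p)
```

When thousands of orthologs are tested, the resulting p-values would normally also be corrected for multiple testing.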

Lineage-specific gene analysis From the perspective of 
combined paralogous and orthologous genes, adaptive 
differences could be determined by analyzing lineage- 
specific gene pools, as shown in the potato [32]. Comparative
genomic analysis of the alligator and 18 representative
species of fish (Tetraodon nigroviridis, Gasterosteus
aculeatus, Oryzias latipes, Takifugu rubripes, and Danio
rerio), amphibians (X. tropicalis), mammals (Dasypus
novemcinctus, Bos taurus, Canis familiaris, Loxodonta
africana, H. sapiens, Monodelphis domestica, Ornitho- 
rhynchus anatinus, Rattus norvegicus, and Choloepus 
hoffmanni), and reptiles (Anolis carolinensis, T. guttata, 
and G. gallus) revealed 6 122 alligator lineage-specific 
genes (Supplementary information, Table S9), of which 
1 543 (25.20%) are functionally annotated. 

Diving behavior adaptation 

Crocodilians inherited their terrestrial lifestyle from the
ancestral amniotes [33], which poses a considerable challenge
for diving. Thus, we examined their genetic signatures from
multiple angles, including oxygen transport, energy 
supply, urinary excretion, and cardiac muscle contraction
systems.

Figure 1 Comparisons of orthologous and paralogous genes in the genomes of different species. (A) TreeFam-based clustering
of gene families. (B) Divergence time of six species. (C) Expansion and contraction of CAFE-based gene families.
(D) Gene family of functional olfactory receptors (ORs). a Mann-Whitney U test, P < 0.01; b Mann-Whitney U test was not performed
because the number of α-ORs was less than 30.

Oxygen transport Hemoglobin (Hb) is responsible for
oxygen (O₂) transport and is composed of two α and two
β subunits. The phylogenetic tree of reptilian Hbs shows
that the Chinese alligator has four crocodilian-specific
Hb genes: one α (HBA1) and three β (HBB2, HBB4, and HBB5)
(Supplementary information, Figure S6). The transcriptome
and proteome data demonstrate that these genes are
all highly expressed in the blood (Figure 2A). Previous
studies demonstrated that crocodilian hemoglobin had
mutated from a routine bisphosphoglycerate-binding
type to a special bicarbonate (HCO₃⁻)-binding form
to increase O₂ release [34]. The Hb sequence alignment
shows that the alligator HBA1, HBB2, and HBB4 are
HCO₃⁻-binding subunits but HBB5 is a routine β subunit
(Figure 2A). Thus, our results reveal that in the Chinese
alligator: (1) the HCO₃⁻-binding β gene has been duplicated
once; (2) it possesses both routine phosphate-binding
and special HCO₃⁻-binding Hb effectors; and (3)
the only α subunit, HBA1, simultaneously participates in the
assembly of two types of Hb molecules (Figure 2A). We
examined PSGs and found that anion exchanger 1, which
facilitates O₂ unloading in the standard erythrocyte pathway
[35, 36], has undergone positive selection (Supplementary
information, Table S7), presenting evidence
of active routine O₂ transport. Therefore, our study has
identified multiple unique O₂ transport pathways in the
Chinese alligator (Figure 2B).

Energy supply system We performed KEGG pathway enrichment
for the PSGs and found that alligator PSGs are mostly
metabolism-related (Supplementary information, Table
S10). In particular, oxidative phosphorylation (OXPHOS),
which is responsible for energy production, is overrepresented
in the Chinese alligator (Supplementary information, Table
S10). We then used iPath [37] to visualize the
relationships among PSGs in metabolic pathways and found
that the PSGs were concentrated in the glycan,
fatty acid, terpenoid, and OXPHOS metabolic pathways
(Figure 2C). Of the 28 iPath-mapped PSGs, 24 are directly
or indirectly part of OXPHOS metabolism at the
mitochondrial inner membrane (Figure 2D). The most
remarkable result is that two ATP synthases (ATPeF0B
and ATPeVAC39) have undergone positive selection in
Chinese alligator (Figure 2D). The rapidly evolving GO
results indicate that the "ATP catabolic process", "phosphorylation",
"glucose homeostasis", "mitochondrial
inner membrane", "mitochondrion", "ATPase activity",
and "heme binding" GO categories have experienced
strong selective pressure (Figure 3 and Supplementary
information, Table S8). Consequently, positive selection
in the energy metabolism-related genes and GO classes
suggests a special energy demand in alligators during diving.

Figure 2 Diving adaptations in the Chinese alligator genome. (A) Alignment of hemoglobin genes. Asterisk represents HCO₃⁻-binding
sites [34]. CA, AA, NC, and SC represent the Chinese alligator, American alligator, Nile crocodile, and Spectacled
caiman. (B) Standard and HCO₃⁻-binding O₂ transportation pathways [36]. (C) Positively selected genes (PSGs) have been

Urinary excretion Amniotes remove carbon dioxide
(CO₂) by lung ventilation [38]. As no gas exchange is
possible during a dive, crocodilians secrete ammonium
bicarbonate (NH₄HCO₃) in the urine as the major excretory
route [9]. We surveyed the genetics of the excretory
system and discovered that the ammonium (NH₄⁺)
transporter (PF00909) and HCO₃⁻ transporter (PF00955)
families are overrepresented in the PSGs (Supplementary
information, Table S11). Furthermore, the PSG-enriched
pathways, cyanoamino acid metabolism (map 00460)
and glutathione metabolism (map 00480) (Supplementary
information, Table S10), generate formamide and glutamate,
which are the precursors of ammonia (NH₃) (map
00910; Figure 2E). CA4 (carbonic anhydrase IV), which
binds the plasma membrane of the tubular lumen [39]
and controls urine HCO₃⁻ concentrations by catalyzing
the reversible dehydration of carbonic acid (Figure 2E),
and RHCG (Rhesus blood group, C glycoprotein), which
excretes NH₄⁺ in kidney tubules and regulates body acid-base
balance [40], have both undergone positive selection
in Chinese alligator (Supplementary information,
Table S7). These results provide evidence for the positive
selection of NH₄HCO₃ secretion through the urinary system
in the alligator.

Cardiac muscle contraction Once oxygen depletion
occurs during submergence, the crocodilians slow their
heart rate and supply oxygenated blood only to the brain
[9]. The alligator lineage-specific rapidly evolving GO
results demonstrate that the diving hypoxia adaptation-related
categories, including "response to hypoxia",
"heart development", "response to stress", and "heat
shock protein binding" (Figure 3B and 3D), have undergone
fast evolution in the Chinese alligator. We
further examined the PSG evidence for the crocodilian
cardiovascular system and found four PSGs related to the
assembly of sarcomeres, suggesting strengthened striated
muscles (Figure 2F). Mapping of PSGs to the cardiac
muscle contraction pathway (map 04260) indicates that
the Na⁺/K⁺-ATPase β subunit (ATP1B), which controls the
repolarization/relaxation of the cardiac muscle [41], is
a PSG (Figure 2F and Supplementary information, Table S7).
Furthermore, two other PSGs, SCN4B and KCNJ8
(Supplementary information, Table S7), have also been
associated with heart repolarization [42, 43]. Thus, these
positive selection (dN/dS test) results suggest that a longer
resting state and slower heartbeat of robust cardiac
muscle may meet the special demand of minimizing
metabolic rate during alligator diving.

Sensory system signatures 

We extracted molecular evidence of an excellent olfac- 
tory ability in the Chinese alligator genome. First, the al- 
ligator lineage-specific rapidly evolving GO results show 
that the "receptor activity", "transporter activity", and 
"ion channel activity" have experienced rapid evolution 
(Figure 3D). We then enriched the GO domains of the 
CAFE-based gene families and found that the "receptor 
activity", "transmembrane signaling receptor activity", 
and "olfactory receptor (OR) activity" genes are overrepresented
(all P = 0.0000) in the molecular function category
of the Chinese alligator (Supplementary information,
Table S12). We thus re-annotated the OR families
for the Chinese alligator, green anole lizard, zebra finch,
chicken, turkey, and human and found that the Chinese
alligator had the most ORs (Supplementary information,
Table S13). We performed a CAFE-based analysis
of these OR gene families and found that the Chinese
alligator had the most expanded and the fewest contracted
OR gene families (Supplementary information, Figure
S7). We then built a phylogenetic tree for the ORs of
six species and found that these ORs are grouped into a
single-copy θ basal branch and two multiple-copy α and
γ clusters (Supplementary information, Figure S8). This
clustering relationship indicated that the alligator possessed
the most abundant functional γ- and α-type ORs
relative to other reptiles and a similar number of ORs to
human (Figure 1D). The γ- and α-type ORs bind airborne
and water-soluble odorant molecules [44], respectively,
and the α-type ORs in the human are putative relics of
ancestral tetrapods [45]. Thus, we calculated the dN/dS
values of the alligator γ- and α-ORs relative to its θ gene
and found that Chinese alligator presented a significantly
higher dN/dS ratio in the γ set than in the α set (Figure 1D;
Mann-Whitney U test, P = 0.0082). Hence, the large quantity
and high selective pressure (dN/dS) of γ ORs suggest that
the Chinese alligator relies heavily on airborne odorant
detection.
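The Mann-Whitney U test used above compares two groups of values (here, dN/dS ratios) by rank rather than by magnitude. The following minimal sketch uses the normal approximation with tie correction, which is reasonable only for larger samples; the function name and the toy dN/dS values are hypothetical, and the exact variant of the test used in the study is not stated in the source.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic of sample x, approximate two-sided p-value).
    No continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # mean of 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    return u_x, math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical dN/dS values for two OR groups, for illustration only.
group_one = [0.42, 0.51, 0.38, 0.60, 0.47]
group_two = [0.21, 0.30, 0.25, 0.33, 0.28]
print(mann_whitney_u(group_one, group_two))
```

Because the approximation degrades for small groups, an exact test is preferable when either sample is small.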

In addition, we identified genetic signatures from the 
alligator nervous system. Two synaptic genes (SYT11 and 
NLGN3) [46, 47] have been positively selected (Supple- 
mentary information, Table S7) and neurological system 
process (GO:0050877) was the most overrepresented biological
process (BP) among the alligator-specific genes
(Supplementary information, Table S14). Finally, four 
nerve-related cellular components (CCs) have rapidly 
evolved, including the "neuronal cell body", "dendrite", 
"synapse", and "synaptosome" (Figure 3C). 

We also obtained evidence of enhanced auditory and 
ocular systems in the Chinese alligator. We found that 
visual perception has rapidly evolved in the alligator 
(Figure 3B), and opsin (OPN3) and otopetrin (OTOP1) 
have been positively selected (Supplementary informa- 
tion, Table S7). Opsin senses light [48] and is associated 
with dark adaptation [49], whereas the otopetrin triggers 
development of the otolith in fish [50], which is impor- 
tant for detecting sounds under water [51]. 

Features of the immune system 

The single-copy genes in the "immune response" GO 
category have undergone fast evolution in the Chinese 
alligator (Figure 3B). The alligator lineage-specific genes 
indicate that the BP GO classes of "antigen processing
and presentation" (GO:0019882), "defense response"
(GO:0006952), "immune response" (GO:0006955), and "immune
system process" (GO:0002376) are overrepresented
(Supplementary information, Table S14), suggesting that
the Chinese alligator genome features antigen-triggered
adaptive immunity. The major histocompatibility complex
(MHC) is responsible for antigen presentation and
is divided into class I and class II molecules [52]. The
adaptive immune system of the Chinese alligator was
characterized by lineage-specific class I MHC genes,
as shown by the enriched CC categories GO:0042612
(MHC class I protein complex) and GO:0042611 (MHC
protein complex) (Supplementary information, Table
S14).
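Statements that a GO category is "overrepresented" in a gene set are commonly backed by a one-sided hypergeometric (Fisher-type) test. The sketch below shows that calculation under the assumption that such a test applies; the source does not state which enrichment test was used, and all counts are hypothetical.

```python
from math import comb


def enrichment_pvalue(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background set
    K: background genes annotated to the category
    n: genes in the study set (e.g. lineage-specific genes)
    k: study-set genes annotated to the category
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(n, K)):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


# Hypothetical counts, for illustration only.
print(enrichment_pvalue(k=12, n=200, K=150, N=20000))
```

A small p-value means the category appears in the study set more often than expected from the background annotation frequency.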

The alligator-specific gene families show that the
BP categories of "immune response" (GO:0006955),
"immune system process" (GO:0002376), and "innate
immune response" (GO:0045087) are overrepresented
(Supplementary information, Table S15), reflecting
strong innate immunity in the Chinese alligator. We then
carried out Ensembl-based classification of immune gene
families from different animals and found that 43 had expanded
in the alligator relative to the other reptiles (Figure
4). The first-, second-, and third-ranked gene families
are the tripartite motif (TRIM)-containing, C-type lectin
(CLEC), and butyrophilin families, respectively (Figure 4). The
TRIM superfamily is a versatile effector in innate immunity
and participates in resistance to different pathogens,
especially lentiviruses such as HIV [53]. CLEC is
a major receptor on the natural killer cells that regulate
innate immunity [54], while butyrophilin is a regulator
responding to inflammation [55]. Thus, the marked
expansions of the TRIM, CLEC, and butyrophilin gene
families in Chinese alligator suggest strong innate immune
function.

Crocodilian blood kills bacteria in vitro and it is 
thought to possess antimicrobial activity in vivo [56, 
57]. We scanned for genes that intersected in the blood 
transcriptome and proteome and found that cathelicidin 
(PF00666), a major class of antimicrobial peptides [58], 
is overrepresented in the blood (Supplementary information,
Table S16). Thus, the results suggest that alligator
blood carries an effective non-specific defense
system against microbial infection.

Our study derived three significant findings from the 
Chinese alligator immune system, including evolution 
of alligator-specific genes related to adaptive immunity, 
expansion of genes related to innate immunity, and expression
of antibacterial peptides in the blood, which indicate
that Chinese alligator possesses a well-developed
immune defense system. 

Sex chromosome evolution and DMRT1 alternative 
splicing analyses 

Sex chromosome evolution Reptiles show XY-type 
GSD, ZW-type GSD, and TSD sex-determination mechanisms 
[17]. We sequenced genital gland transcriptomes 
and found 8 743 differentially expressed genes (DEGs) 
between the ovary and testis of the Chinese alligator 
(Supplementary information, Figure S9). We extracted 
the top 20% of DEGs in the Chinese alligator gonads and 
performed chromosomal assignment of their orthologs in 
humans (Supplementary information, Figure S10A) and 
chickens (Supplementary information, Figure S10B). 
Apart from the genes located on the autosomes, the 
other orthologs were uniformly allocated to the 
human X and chicken Z chromosomes (Supplementary 
information, Figure S10). We then used the orthologs 
of all of the DEGs (Figure 5A) to assess their 
expression profiles in the ovary and testis of the alligator, 
human, and chicken (Figure 5B and 5C). We found that 
the testis expression profiles of the alligator and chicken 
did not differ significantly (Mann-Whitney U test, 
P = 0.5988), while all other pairwise comparisons 
revealed significant differences (Mann-Whitney U 
test, all P < 0.05), suggesting that cognate genes in the 
Chinese alligator may resemble the ZW system of the 
chicken. 
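
The pairwise comparison above rests on a rank-based test, which can be sketched as follows. This is a minimal illustration of a two-sided Mann-Whitney U test using the normal approximation with tie correction; the function name and the toy expression values are hypothetical, and the software actually used for the test is not stated in this section.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for sample x, approximate two-sided P value).
    No continuity correction is applied; an exact test is preferable
    for very small samples.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of ranks i+1 .. j (1-based).
        midrank = (i + 1 + j) / 2.0
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return u_x, p_two_sided


# Hypothetical expression values for orthologs in two species (toy data).
alligator_testis = [1.0, 3.0, 5.0, 7.0]
chicken_testis = [2.0, 4.0, 6.0, 8.0]
u, p = mann_whitney_u(alligator_testis, chicken_testis)
print(f"U = {u}, P = {p:.3f}")
```

A large P value, as reported for the alligator and chicken testis profiles, means the test found no evidence that the two distributions differ; it does not by itself demonstrate that they are the same.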

The sex determination system of the chicken is ZZ for 
males and ZW for females; the expression dosage of the 
DMRT1 (doublesex and mab-3 related transcription factor 1) 
gene located on the Z chromosome controls testis 
development and the W-located genes influence ovary 
development [59, 60]. Two W-located genes, ASW (avian 
sex-specific W-linked) and FET1 (female expressed 
transcript 1), are expressed specifically in the embryonic 
gonads of female chickens [59]. ASW is also called 
HINTW (W-linked histidine triad nucleotide binding 
protein) because it has a homologous copy (HINTZ) 
on the Z chromosome [61]. We annotated 1 DMRT1, 1 
HINT, and 130 FET1 genes from the alligator genome, 
and then constructed phylogenetic trees for the HINT 
and FET1 genes for classification. The results revealed 
that the alligator HINT homolog should be HINTZ, as it 
clustered with the chicken HINTZ gene (Supplementary 
information, Figure S11). Alligator FET1 genes also 
differed from chicken FET1 genes because they grouped 
with the non-ovary-specific FET1 genes of the chicken 
(Supplementary information, Figure S12). We scanned 
the BAC library for the single-copy DMRT1 and HINTZ 
and obtained two BACs, 316D7 (~120 kb) for DMRT1 
and 324H6 (~80 kb) for HINTZ, for fluorescence in situ 
hybridization (FISH). The FISH results show that the 
DMRT1 and HINTZ genes are located on the p-arm of 
chromosome 3 of the Chinese alligator (Supplementary 
information, Figure S13). We performed synteny analysis 
between the alligator scaffolds and the chicken Z 
chromosome and chose the three largest syntenic blocks 
from scaffolds 560_1, 573_1, and 240_1 (Supplementary 
information, Figure S14), where we again selected the 
three most significant DEGs of genital glands 
(farnesyltransferase, FNTA; glutathione peroxidase, GPx; 
terminal uridylyltransferase, TUT) for BAC library 
scanning. We finally obtained 877H6 (~70 kb) for FNTA, 
785B7 (~125 kb) for GPx, and 1638C10 (~120 kb) for 
TUT, and subjected them to multi-color FISH (M-FISH), 
together with 316D7 for DMRT1 and 324H6 for 
HINTZ, to examine inter-chromosome synteny. The male 
and female M-FISH results present identical assignments 
of the five BACs to chromosome 3 (Figure 5D and 5E) and 
a perfect synteny to the chicken Z chromosome (Figure 
5F). Therefore, alligator chromosome 3 and chicken 
chromosome Z shared an ancestral chromosome. 

DMRT1 splice variant The key sex determination gene, 
DMRT1, exhibits sex-specific alternative splicing [62, 
63]. In this study, we observed 10 alternatively spliced 
DMRT1 variants in the genital transcriptome sequences 
of the Chinese alligator due to exon skipping and intron 
inclusion, six of which were specific to the testis (Figure 
5G). Comparisons of the DMRT1 genes of the Chinese 
alligator, chicken [62], and Mugger crocodile (Crocodylus 
palustris) [63] revealed large discrepancies in the lengths 
of the DMRT1 genomic and coding sequences; the 
Chinese alligator DMRT1 is composed of five exons and 
occupies a 100-kb genomic fragment, in sharp 
contrast to the Mugger crocodile DMRT1, which 
consists of three exons spanning only 4 kb (Supplementary 
information, Figure S15A). This extreme diversification 
of the DMRT1 structure was accompanied by various 
alternative splicing events such as intron inclusion, 
exon skipping, premature stop codons, 
and frame-shift mutations (Supplementary information, 
Figure S15B-S15D). The DMRT1 gene is well known 
for its DM (doublesex and mab-3 related) domain, which 
is a cysteine-rich DNA-binding motif first recognized in 
proteins encoded by the Drosophila sex determination 
gene, doublesex (DSX) [64]. The DSX gene undergoes 
sex-specific alternative splicing, and the resultant male- 
and female-specific isoforms direct male and female 
development in the fruit fly [64]. In addition to the DM 
domain, the DMRT1 gene usually contains another DMRT1 
domain (NCBI CDD pfam12374) [62]. Alignment of the 
alternatively spliced variants of DMRT1 showed that all 
isoforms harbored the DM domain, and allowed us to 
identify alternative splicing hotspots at the ends of the 
DM and DMRT1 domains across different reptiles 
(Supplementary information, Figure S16). Of the six 
testis-specific isoforms (e-j) of the Chinese alligator DMRT1, 
four isoforms (g-j) contain the DM domain but not the 
DMRT1 domain (Supplementary information, Figure 
S16). Furthermore, no DM-only DMRT1 isoforms were 
expressed in the ovary (Figure 5G), suggesting a 
DM-biased genital expression profile in the Chinese alligator. 

Previous studies have shown that the DM-only genes 
act as independent sex-determining factors in some 
species, such as the W-linked female-specific DMW in the 
African clawed frog Xenopus laevis [65], and the Y-linked 
male-specific DMY/DMRT1Y in the medaka Oryzias 
latipes [66]. Comparisons of DMRT1 genes between 
two TSD crocodilians revealed that a nearly identical 
DMRT1 isoform was shared by the Chinese alligator and 
the Mugger crocodile (Alligator sinensis DMRT1g and C. 
palustris DMRT1b) (Supplementary information, Figure 
S16). Interestingly, DMRT1b is the only isoform in the 
Mugger crocodile that contains the DM domain alone 
(Supplementary information, Figure S16). Furthermore, 
an alternative splicing study identified chicken 
DM-only DMRT1-V4 and found that it was the only isoform 
specific to male embryonic gonads [62]. Thus, we named 
the alligator DMRT1g isoform "DMZ" due to its 
inclusion of only the DM domain and its high similarity to 
the chicken Z-linked DMRT1. As the W-located DMW in 
frogs and the Y-located DMY in fishes trigger sex 
differentiation, the expression bias and splicing site 
similarities between Chinese alligator DMZ, Mugger crocodile 
DMRT1b, and chicken DMRT1-V4 suggest that DMZ 
may play an important role in the sex determination of 
the Chinese alligator. 

Single nucleotide polymorphism (SNP) and population 
history analyses 

The Chinese alligator is an endangered species, 
making its population history another issue of interest in 
conservation biology. Moreover, the sequenced individual 
was collected from the severely bottlenecked Changxing 
Chinese alligator population, which developed from 11 
founders in 1979 [67]. This severe population bottleneck 
was examined in this study. We aligned clean reads to 
the genome sequence and identified 318 283 SNPs. The 
heterozygosity rate was 0.15 × 10⁻³, which is much lower 
than those of the green anole, chicken, and human (Figure 
6A). The SNP heterozygosities of the coding sequences 
(CDS) and intronic regions were similar in the Chinese 
alligator, whereas the ratios between the two were all 
approximately 0.5 in the green anole, chicken, and human 
(Figure 6A), suggesting a rapid loss of genetic variation 
in the non-coding 
sequences of the Chinese alligator. Furthermore, the SNP 
curve of the Chinese alligator depicted a continuously 
elevated number of homozygous SNPs while others en- 
tered the descending phase (Supplementary information, 
Figure S17). These results provide evidence for the on- 
going bottleneck in the Chinese alligator. 
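
The heterozygosity figure can be related to the SNP count with simple arithmetic. The sketch below assumes the rate is the number of heterozygous SNPs divided by the number of bases considered; the denominator used in this study is not stated in this section, so the 2.3-Gb genome size from the abstract serves only as a rough stand-in, and both function names are hypothetical.

```python
def heterozygosity_rate(n_het_snps, n_bases):
    """Heterozygous SNPs per base over the bases considered."""
    if n_bases <= 0:
        raise ValueError("n_bases must be positive")
    return n_het_snps / n_bases


def implied_callable_bases(n_het_snps, rate):
    """Number of bases that a given SNP count and rate would imply."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return n_het_snps / rate


SNP_COUNT = 318_283        # SNPs reported in the text
GENOME_SIZE = 2.3e9        # approximate genome size from the abstract (assumed denominator)
REPORTED_RATE = 0.15e-3    # heterozygosity rate reported in the text

approx_rate = heterozygosity_rate(SNP_COUNT, GENOME_SIZE)
implied = implied_callable_bases(SNP_COUNT, REPORTED_RATE)
print(f"rate over 2.3 Gb: {approx_rate:.2e} per base")
print(f"bases implied by the reported rate: {implied / 1e9:.2f} Gb")
```

Dividing by the full genome size gives a value slightly below the reported rate, which would be consistent with a denominator somewhat smaller than 2.3 Gb, for example only the positions with adequate read coverage. That interpretation is an inference from the arithmetic, not a statement from the study.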

Figure 6 Single nucleotide polymorphism (SNP) (A) and population history (B) analyses. The shaded area represents the 
time span of the Qinghai-Tibetan Plateau uplift. The effective population sizes of the Chinese alligator, green anole and 
human are indicated on the left; the chicken is on the right (Supplementary information, Figure S18). 

Based on the SNP data, we estimated the population 
history of the Chinese alligator (and that of the chicken, 
green anole, and human) by using the pairwise sequentially 
Markovian coalescent (PSMC) model, which has 
been used to deduce human population history [68]. The 
alligator PSMC curve depicts a unique increase in its 
effective population size (Ne) between 0.60 and 1.05 Mya, 
when other tested species present a consistent decline 
in population size (Supplementary information, Figure 
S18). In the human PSMC curve, the younger peak may 
correspond to an increase in Ne induced by population 
separation and subsequent admixture [68]. The Chinese 
alligator lives in the Yangtze River, the third longest 
river in the world. The source of the Yangtze River lies 
in the Tanggula Mountains of the Qinghai-Tibetan Pla- 
teau, which experienced widespread and rapid uplifting 
between 0.6 and 1.1 Mya [69]. The fossil record indi- 
cates that the Chinese alligator once resided in Xinjiang 
Province of the Qinghai-Tibetan region [11]. Thus, the 
concordance between the Ne increase and the Qinghai- 
Tibetan Plateau uplift (Figure 6B) suggests that Chinese 
alligators living in the upper Yangtze River would have 
been forced to swim toward the middle-lower Yangtze. 
The resulting gene exchange between the upper and low- 
er stream alligators would explain the enhanced Ne in the 
range of 0.60-1.05 Mya. 
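
PSMC reports time and population size in scaled units, so a curve such as that in Figure 6B depends on a per-generation mutation rate and a generation time for conversion to years and Ne. The sketch below shows the commonly used scaling; the mutation rate, generation time, and interval values are placeholders chosen for illustration and are not the parameters used in this study.

```python
def scale_psmc(theta0, intervals, mu, gen_time, bin_size=100):
    """Convert scaled PSMC output to years and effective population size.

    theta0    : scaled mutation rate per bin estimated by PSMC
    intervals : list of (t_k, lambda_k) pairs in PSMC scaled units
    mu        : mutation rate per base per generation (assumed value)
    gen_time  : generation time in years (assumed value)
    bin_size  : bases per bin in the PSMC input (100 is the usual default)
    """
    n0 = theta0 / (4.0 * mu * bin_size)          # reference population size N0
    scaled = []
    for t_k, lambda_k in intervals:
        years = 2.0 * n0 * t_k * gen_time        # time is in units of 2*N0 generations
        ne = n0 * lambda_k                       # size is relative to N0
        scaled.append((years, ne))
    return scaled


# Placeholder inputs for illustration only.
example_intervals = [(0.5, 1.2), (1.0, 2.0), (2.0, 0.8)]
for years, ne in scale_psmc(theta0=0.001, intervals=example_intervals,
                            mu=1e-8, gen_time=10.0):
    print(f"{years:,.0f} years ago: Ne = {ne:,.0f}")
```

Because both axes scale with the assumed mutation rate, and the time axis also with generation time, the dating of the Ne increase to 0.60-1.05 Mya is sensitive to those two parameters.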

Discussion 

The amniotes diverged from the tetrapods ~340 Mya 
[70] and attained the ability to inhabit terrestrial 
environments. Many mammals and reptiles then returned to the 
water and regained adaptations to aquatic life. 
Crocodilians are semi-aquatic reptiles with unique diving, 
sensory, and immune adaptations. The Chinese alligator 
genome sequence has unraveled the genetic basis of 
secondary aquatic adaptations in the circulatory, metabolic, 
excretory, cardiac, olfactory, nervous, ocular, auditory, 
and the innate and adaptive immune systems, presenting 
evidence for co-evolution of multiple systems specific to 
the back-to-the-water transition. Thus, this study provides 
a good example of how terrestrial-style reptiles adapt to 
aquatic environments. 

Hypoxia usually refers to passive exposure to low O2 at 
high altitude [31, 71] or in aquatic conditions [72]. Aerobic 
diving is a distinctive form of voluntarily tolerated hypoxia. 
Crocodilians have excellent diving ability, which can help 
them to resist atmospheric hypoxia [9]. Consequently, 
the unique molecular signatures of the alligator diving 
adaptation provide a new perspective on hypoxia 
resistance. 

The Chinese alligator genome is the first complete 
crocodilian genome to become available, and this will 
provide a comparative genomic target for deducing the 
characteristics of the ancestral reptilian genome and root- 
ing the complicated phylogeny of birds. The alligator is 
also the first TSD species whose genome has been se- 
quenced; therefore, it fills an important gap in resolving 
sex chromosome evolution. 

In addition, this single Chinese alligator genome 
provides a snapshot of the population demography of the 
Chinese alligator about 1 million years ago and effectively 
captures molecular evidence of past geophysical events 
and historical gene flow. Genome sequencing of the 
Chinese alligator provides a valuable resource for future 
efforts in designing better strategies to help protect this 
endangered reptile. 

Materials and Methods 

All samples for genome and transcriptome sequencing were 
provided by the Changxing Yinjiabian Chinese Alligator Nature 
Reserve. Illumina sequencing and the SOAPdenovo algorithm 
were used to assemble the Chinese alligator genome. Sequence 
similarity at the nucleotide and protein levels was applied to 
identify repeat sequences in the Chinese alligator. Genes were 
annotated based on the repeat-masked genome sequence using ab 
initio, homology- and RNA-based gene prediction models. Genome 
adaptive features were extracted from gene families, dN/dS tests, 
and lineage-specific genes. Sex chromosome evolution was analyzed 
by synteny comparison and M-FISH. Population history was 
reconstructed from SNP data. 

Full Materials and Methods are provided in Supplementary 
information, Data S1. 